ABSTRACT


A sign language recognition system is of major use to people who
are deaf or mute. Such a system gives them a medium for
communicating with their family members and with other people.
People who are deaf or mute are often far from the mainstream, and
many lack a proper job and livelihood. They spend much of their
lives learning sign languages that most hearing people do not
understand. A sign language detection system can bridge this gap by
providing a platform through which deaf or mute people and hearing
people can communicate with each other. Such systems can be set up
at schools, hospitals, hotels, malls, etc., making communication
much simpler. Hand gestures are among the easiest forms of
nonverbal communication and play a vital role in daily life. The
proposed paper provides a user-friendly way of communicating with
the help of a CNN algorithm.
1. INTRODUCTION

A sign language detection system is an important kind
of system in today's world, where we are all
developing our skills daily with advanced
technologies. People who are deaf or mute are often
unable to express their feelings, because they rely on
traditional sign language to communicate with
hearing people who do not understand it.


In the new era of technology there must be some
focus on developing sign language detection
systems that deaf or mute people can use to express
their thoughts. The sign language detector acts as a
mediator between deaf or mute people and hearing
people by translating signs into letters of the
alphabet. In this paper we develop such a system
with the help of Python, TensorFlow, Keras and
OpenCV, all of which are available free of cost. We
use the 26 letters of the English alphabet, and each
letter is represented by a specific sign.


To develop such a system we need to set up the
environment by selecting the development IDE
(PyCharm) and installing the required modules and
supporting files.


After the sign language detector is developed, we
can deploy it on a cloud-based platform such as
Amazon Web Services so that it is always available
online, with minimal downtime and without data
loss, and is freely and easily accessible through a
browser. We can use an S3 bucket for storing the
dataset and EC2 for deploying the system on the
cloud.


2. Literature Review 

A review of the literature shows that much research
has been done on sign language detection in videos
and images using several methods and algorithms.


The paper by M. Geetha and U. C. Manjusha [7]
uses 50 specimens of every alphabet and digit in a
vision-based recognition of Indian Sign Language
characters and numerals using B-spline
approximations. The region of interest of the sign
gesture is analysed and its boundary is extracted. The
boundary obtained is then transformed into a
B-spline curve by using the Maximum Curvature
Points (MCPs) as the control points. The B-spline
curve undergoes a series of smoothing steps so that
features can be extracted. A support vector machine
is used to classify the images, and the accuracy is
90.00%.


The paper by J. Rekha [5] uses the YCbCr skin
model to detect and segment the skin region of the
hand gestures. Image features are extracted using the
Principal Curvature Based Region Detector and
classified with multi-class SVM, DTW and
non-linear KNN. A dataset of 23 Indian Sign
Language static alphabet signs was used for training
and 25 videos for testing. The experimental results
obtained were 94.4% for static and 86.4% for
dynamic signs.


Siming He [4] proposed a system with a dataset of
40 common words and 10,000 sign language images.
To locate the hand regions in the video frame, Faster
R-CNN with an embedded RPN module is used,
which improves accuracy. According to the paper,
detection and template classification can be done at
a higher speed than with single-stage target
detection algorithms such as YOLO. The detection
accuracy of Faster R-CNN in the paper increases
from 89.0% to 91.7% compared with Fast R-CNN.
A 3D CNN is used for feature extraction, and a sign
language recognition framework consisting of a long
short-term memory (LSTM) encoding and decoding
network is built for the sign language image
sequences. To address RGB sign language image or
video recognition in practice, the paper combines the
hand-locating network, the 3D CNN feature
extraction network and the LSTM encoder-decoder
to construct the overall algorithm. The paper reports
a recognition rate of 99% on the common
vocabulary dataset.


3. Dataset 

A dataset is a collection of data that can be used to
perform gesture recognition. Every letter is assigned
a specific gesture and has a set of multiple images.
For example, the letter A has a set of 1750 gesture
images, and every other letter likewise has 1750.
The dataset therefore contains 1750 × 26 = 45,500
images in total, which is a large dataset. Sign
detection accuracy generally improves as the dataset
grows.
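
The dataset size quoted above can be checked with a few lines of
Python. This is only a bookkeeping sketch: the 26 letters and 1750
images per letter come from the text, while the dictionary layout
and variable names are illustrative assumptions.

```python
import string

# Figures taken from the text: 26 letters, 1750 images per letter.
LETTERS = string.ascii_uppercase
IMAGES_PER_LETTER = 1750

# Hypothetical bookkeeping structure: letter -> number of gesture images.
images_per_class = {letter: IMAGES_PER_LETTER for letter in LETTERS}
total_images = sum(images_per_class.values())

print(len(images_per_class), "classes,", total_images, "images")
```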


The dataset used is shown below.


Fig. 1 Dataset


4. Architecture Diagram 

The diagram below shows the proposed system. In
the first step, skin colour and tone are recognized
and segmentation is performed on the image. This
process breaks the digital image down into
subgroups called image segments and converts the
colour image into a black and white image.
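
As a rough illustration of how a colour image can be turned into a
black and white skin mask, the sketch below thresholds each pixel in
the YCbCr colour space. This is an assumption made for illustration,
not necessarily the segmentation used in the proposed system (which
relies on a histogram set up by the user). The function names and
the Cb/Cr ranges are hypothetical; the ranges are commonly quoted
heuristic values.

```python
def rgb_to_cbcr(r, g, b):
    """Return the Cb and Cr chroma values of an 8-bit RGB pixel
    (full-range BT.601 conversion, as used by JPEG/JFIF)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def skin_mask(image, cb_range=(77, 127), cr_range=(133, 173)):
    """Convert an RGB image (rows of (r, g, b) tuples) into a black and
    white mask: 255 where the pixel looks like skin, 0 elsewhere.

    The default ranges are assumed heuristic thresholds, not values
    taken from the paper.
    """
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            cb, cr = rgb_to_cbcr(r, g, b)
            is_skin = (cb_range[0] <= cb <= cb_range[1]
                       and cr_range[0] <= cr <= cr_range[1])
            mask_row.append(255 if is_skin else 0)
        mask.append(mask_row)
    return mask


# A 2x2 toy image: one skin-like pixel, then blue, white and black.
toy_image = [[(224, 172, 105), (0, 0, 255)],
             [(255, 255, 255), (0, 0, 0)]]
print(skin_mask(toy_image))
```

A fixed threshold like this is sensitive to lighting, which is
consistent with the lighting and background constraints noted in
the conclusion.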


Pattern recognition is the use of machine learning
algorithms to identify patterns and their
representation. Labeled training data is used to train
pattern recognition systems: a label is attached to
each input value and is used to produce a
pattern-based output. In the absence of labeled data,
other algorithms may be employed to find unknown
patterns.


Pattern recognition is achieved through learning.
Learning enables the pattern recognition system to
be trained and to adapt so that it gives more accurate
results. One part of the dataset is used for training
the system, while the rest is used for testing it.


The following image shows how data is used for 
training and testing. 


[Figure: training data is used to build the system; testing data is used to check it]
Fig. 2 Training and testing
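
The split into training and testing data can be sketched as
follows. The 80/20 ratio, the fixed random seed and the file names
are assumptions chosen for illustration; the paper does not state
the split it used.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle the samples and split them into (train, test) lists.

    test_fraction and seed are illustrative defaults, not values
    reported in the paper.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical (file name, label) pairs for the 1750 images of letter A.
samples = [(f"A_{i}.jpg", "A") for i in range(1750)]
train_set, test_set = train_test_split(samples)
print(len(train_set), len(test_set))  # 1400 350
```

Keeping the test images out of training is what allows the test
score to indicate how the system behaves on gestures it has not
seen before.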


Gesture pose identification identifies the hand sign
and returns the letter that is assigned to that
particular sign.
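
A minimal sketch of this last step, assuming the classifier
produces one score per letter in alphabetical order: the letter
with the highest score is returned. The function name and the score
layout are hypothetical.

```python
import string

LETTERS = string.ascii_uppercase  # class index 0 -> "A", ..., 25 -> "Z"


def identify_letter(scores):
    """Return the letter whose class score is highest.

    `scores` is assumed to hold 26 values, one per letter, in
    alphabetical order.
    """
    if len(scores) != len(LETTERS):
        raise ValueError("expected one score per letter")
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return LETTERS[best_index]


# Example: a score vector in which class index 2 is the most likely.
example_scores = [0.01] * 26
example_scores[2] = 0.75
print(identify_letter(example_scores))  # C
```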


[Figure: blocks labelled Segmentation, Segmented Image, Pattern Recognition algorithm, Gesture Pose Identification and Alphabet]
Fig. 3 Architecture Diagram of Proposed System


5. Algorithms 

A. Region of interest extraction 

A region of interest (ROI) is a meaningful and
important region of an image. Here it is the
rectangular box that appears on screen during sign
recognition: the user places the hand sign inside the
box, and the part of the image that falls inside the
box is the ROI.


[Figure: hand gesture pose]
Fig. 4 Region of Interest (ROI)
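
Extracting the ROI amounts to cropping the frame to the rectangular
box. The sketch below assumes the frame is stored as a list of
pixel rows and that the box is given by its top-left corner and
size; these names and the representation are illustrative only.

```python
def extract_roi(frame, x, y, width, height):
    """Crop the rectangle with top-left corner (x, y) out of `frame`.

    `frame` is a list of rows; each row is a list of pixel values.
    """
    if (x < 0 or y < 0 or width <= 0 or height <= 0
            or y + height > len(frame) or x + width > len(frame[0])):
        raise ValueError("ROI box must lie inside the frame")
    return [row[x:x + width] for row in frame[y:y + height]]


# A 4x6 toy frame whose pixel value encodes its position (row*10 + col).
frame = [[r * 10 + c for c in range(6)] for r in range(4)]
roi = extract_roi(frame, x=2, y=1, width=3, height=2)
print(roi)  # [[12, 13, 14], [22, 23, 24]]
```

Only the cropped region is passed on to the recognition stage, so
anything outside the box cannot influence the result.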


B. Convolutional Neural Network (CNN) 
A convolutional neural network (CNN) is a type of
artificial neural network, used in image recognition
and processing, that is specifically designed to
process pixel data.


C. Convolutional Layer 

A convolutional layer is the main building block of a
CNN. It contains a set of filters (or kernels) whose
parameters are learned during training. The filters
are usually smaller than the actual image. Each filter
convolves with the image and creates an activation
map.
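
To make the operation concrete, the following dependency-free
sketch slides one filter over a single-channel image and produces
its activation map. It assumes stride 1 and no padding, and the
kernel values are fixed by hand for illustration, whereas in a CNN
they would be learned. In practice a library layer such as Keras
`Conv2D` performs this step.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return
    the activation map.

    As in most CNN libraries, the kernel is not flipped, so this is
    strictly a cross-correlation.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(k_h):
                for dj in range(k_w):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


# A 4x4 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1] for _ in range(4)]
# A hand-picked 2x2 filter that responds to dark-to-bright vertical edges.
edge_kernel = [[-1, 1],
               [-1, 1]]
activation_map = convolve2d(image, edge_kernel)
print(activation_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The activation map is largest exactly where the filter's pattern
(here, the edge) occurs in the image.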


D. ReLU Layer

The rectified linear activation function, or ReLU for
short, is a piecewise linear function that outputs the
input directly if it is positive and outputs zero
otherwise.
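
In formula form, ReLU(x) = max(0, x). A minimal sketch applying it
element-wise to an activation map (the function names are
illustrative):

```python
def relu(x):
    """Return x if it is positive, otherwise zero."""
    return x if x > 0 else 0


def relu_map(feature_map):
    """Apply ReLU to every value of a 2D feature map."""
    return [[relu(value) for value in row] for row in feature_map]


print(relu_map([[-2, 0, 3], [1.5, -0.5, 0]]))  # [[0, 0, 3], [1.5, 0, 0]]
```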


E. Pooling Layer 

Pooling layers are used to reduce the dimensions of 
the feature maps. Thus, it reduces the number of 
parameters to learn and the amount of computation 
performed in the network. The pooling layer 
summarises the features present in a region of the 
feature map generated by a convolution layer. 
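
The sketch below shows max pooling, one common way of summarising a
region; the paper does not say which pooling variant it uses, so the
choice of max pooling and of a 2x2 window with stride 2 is an
assumption. In practice a library layer such as Keras
`MaxPooling2D` performs this step.

```python
def max_pool2d(feature_map, size=2):
    """Keep the largest value of each non-overlapping size x size window.

    Rows or columns that do not fill a complete window are dropped.
    """
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [feature_map[i * size + di][j * size + dj]
                      for di in range(size) for dj in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 6],
               [2, 2, 7, 8]]
pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

A 4x4 map becomes 2x2, so the following layers handle a quarter of
the values.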


F. Fully-connected Layer 

A fully connected layer is simply a feed-forward
neural network layer. Fully connected layers form
the last few layers in the network. The input to the
first fully connected layer is the output of the final
pooling or convolutional layer, which is flattened
and then fed into the fully connected layer.
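
The flatten-then-classify step can be sketched as below. The tiny
2x2 feature map, the three output classes and the hand-written
weights are assumptions that keep the example readable; the
proposed system would have 26 outputs (one per letter) and learned
weights. The softmax at the end is likewise an assumed, commonly
used way of turning the outputs into class probabilities.

```python
import math


def flatten(feature_map):
    """Turn a 2D feature map into a flat list of values."""
    return [value for row in feature_map for value in row]


def dense(inputs, weights, biases):
    """Fully connected layer: every output is a weighted sum of all inputs."""
    return [sum(w * x for w, x in zip(weight_row, inputs)) + bias
            for weight_row, bias in zip(weights, biases)]


def softmax(logits):
    """Convert raw outputs into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical 2x2 pooled feature map and a 3-class output layer.
pooled_map = [[1, 2], [3, 4]]
weights = [[1, 0, 0, 0],
           [0, 1, 0, 0],
           [0, 0, 0, 1]]
biases = [0, 0, 0]

logits = dense(flatten(pooled_map), weights, biases)
probabilities = softmax(logits)
predicted_class = probabilities.index(max(probabilities))
print(logits, predicted_class)  # [1, 2, 4] 2
```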


6. Result 

We have successfully developed the sign language
detector. The only part remaining is deploying the
project on a cloud platform and storing the dataset in
an S3 bucket; owing to a lack of resources, we were
unable to deploy the detector on a cloud platform.


We were able to create a dataset with good clarity
after adjusting the histogram, and recognition of the
letters works as expected, so the overall goal has
been achieved.


Fig. 6 Image and Converted Image


A sample output of the sign language detector
recognizing the letter A is shown below.


Fig. 7 Result 


7. Future Scope 

The first priority for future work on the sign
language detector is to deploy it on a cloud-based
platform. Some important changes can be developed
later; for example, support for writing words and
sentences by showing gestures would make the
system easier to use and provide a more advanced
sign language detection system.


Our future goal is to modify the project to make it
more user friendly and to increase the dataset size to
get more accurate results.


There is also a need to add numbers to our gesture
dataset.


8. Conclusion 

Sign language detection is a difficult problem if we
consider the full set of gestures that a system of this
type must translate. The best way to solve such a
problem is to divide it into simpler parts, which we
did here by developing the project as a set of smaller
modules.


The system performs well. The only issue we faced
is that, when setting up the histogram for
recognizing gestures, we need a plain wall as a
non-distracting background and adequate lighting;
too much light also causes difficulty.


It is observed that recognition takes less time for
some letters and more for others, as some of the
letters have similar gestures.